significant. The distance between the normal samples and the tumor samples along the
x-axis is a log2-fold change of about 2, which corresponds to a roughly 4-fold difference.
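The conversion between a log2-fold change and a fold change is simple arithmetic, and it can be checked directly in R. The variable names below are illustrative only and are not part of the analysis script:

```r
# A log2-fold change of 2 corresponds to a fold change of 2^2
log2_fc <- 2
fold_change <- 2^log2_fc

# Going back from the fold change to the log2 scale
back_to_log2 <- log2(fold_change)

print(fold_change)
print(back_to_log2)
```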
We can also use a heatmap to cluster the most variable genes across the samples. We expect
that some samples will show similar patterns depending on their condition (normal
or tumor). The following heatmap script describes the relationships between samples
using hierarchical clustering:
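install.packages(c("gplots", "RColorBrewer"))  # run once
library("gplots")
library("RColorBrewer")  # provides brewer.pal()
png(file="heatmap1.png")
# log2 counts per million; cpm() comes from edgeR, assumed to be loaded already
logcountsNorm <- cpm(yNorm,log=TRUE)
# variance of each gene across all samples
var_genes <- apply(logcountsNorm, 1, var)
# names of the 10 genes with the largest variance
select_var <- names(sort(var_genes, decreasing=TRUE))[1:10]
highly_variable_lcpm <- logcountsNorm[select_var,]
mypalette <- brewer.pal(11,"RdYlBu")
morecols <- colorRampPalette(mypalette)
# one color per condition: first factor level is purple, second is orange
col.con <- c("purple","orange")[factor(sampleinfo$condition)]
heatmap.2(highly_variable_lcpm,
col=rev(morecols(50)),trace="none",
main="Top 10 most variable genes",
ColSideColors=col.con,scale="row",
margins=c(12,8),srtCol=45)
dev.off()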
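To make the clustering step behind the heatmap more concrete, here is a minimal sketch in base R. It assumes a small simulated log-CPM matrix (hypothetical data, not the dataset used in this chapter) and applies dist() and hclust(), which are the default distance and clustering functions of heatmap.2(). It also computes the row z-scores that scale="row" uses for coloring.

```r
# Simulated log-CPM matrix: 10 genes x 6 samples (hypothetical data)
set.seed(1)
lcpm <- matrix(rnorm(60, mean = 5, sd = 0.2), nrow = 10,
               dimnames = list(paste0("gene", 1:10),
                               c(paste0("normal", 1:3), paste0("tumor", 1:3))))

# Raise the first five genes by 2 log2 units (about 4-fold) in the tumor samples
lcpm[1:5, 4:6] <- lcpm[1:5, 4:6] + 2

# Euclidean distances between samples (columns), then complete-linkage clustering
sample_dist <- dist(t(lcpm))
sample_tree <- hclust(sample_dist, method = "complete")

# Cut the sample dendrogram into two groups
groups <- cutree(sample_tree, k = 2)
print(groups)

# Row z-scores: center and scale each gene across samples
z <- t(scale(t(lcpm)))
```

In this simulated example the two groups returned by cutree() coincide with the normal and tumor columns, because the shift applied to five genes is much larger than the within-group noise. With real data the separation may be less clean.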
FIGURE 5.18 Multidimensional scaling (MDS) plot.